Enhancing Contents-Link Coupled Web Page Clustering and Its Evaluation

Authors

  • Yitong Wang
  • Masaru Kitsuregawa
Abstract

Web page clustering is a fundamental technique for managing, locating, and interpreting Web data, and for helping users navigate, discriminate among, and understand Web pages. Most existing clustering algorithms do not adapt well to Web clustering directly, in terms of either efficiency or effectiveness. Combining contents analysis with hyperlink structure analysis has been shown to be a better approach. However, how to effectively combine two features of such different nature to get satisfactory clustering results remains an open problem, and there is still little work on it. In this paper, we present an experimental study on enhancing the coupling of link and contents analysis of Web pages for robust clustering. In particular, we introduce two techniques, in-link reinforcement and anchor window analysis, to improve the adaptability of contents-link coupled clustering. Our detailed evaluation indicates that these techniques can effectively improve the quality of Web page clustering for a wide range of topics.
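
As a rough illustration of what coupling contents and links can mean in practice, the sketch below scores two pages by a weighted sum of cosine similarities over their term vectors, out-links, and in-links. This is a generic formulation chosen for illustration, not the authors' method: the abstract does not define in-link reinforcement or anchor window analysis, so neither is reproduced here. The function names, the dictionary keys (`terms`, `out_links`, `in_links`), and the weights are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as sharing nothing
    return dot / (norm_a * norm_b)


def coupled_similarity(p, q, w_terms=0.5, w_out=0.25, w_in=0.25):
    """Weighted sum of content, out-link, and in-link similarity.

    The weights are illustrative assumptions, not values from the paper.
    """
    return (w_terms * cosine(p["terms"], q["terms"])
            + w_out * cosine(p["out_links"], q["out_links"])
            + w_in * cosine(p["in_links"], q["in_links"]))


# Two hypothetical pages that share in-links but point to different pages.
page_a = {"terms": {"jaguar": 2, "car": 1},
          "out_links": {"u1": 1},
          "in_links": {"v1": 1, "v2": 1}}
page_b = {"terms": {"jaguar": 1, "cat": 1},
          "out_links": {"u2": 1},
          "in_links": {"v1": 1, "v2": 1}}

score = coupled_similarity(page_a, page_b)
```

Raising `w_in` relative to the other weights makes pages that are cited by the same sources look more alike even when their text differs, which is one simple way to let link evidence influence a clustering step built on this score.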

Related resources

Enhancing Navigability in Websites Built Using Web Content Management Systems

Websites built using Web Content Management Systems (WCMSs) usually provide their users with three alternative access structures for browsing their contents: indexes of categories, breadcrumb trails, and sitemaps. In addition, to find contents of interest, a user can perform more or less advanced full-text searches. In this paper we propose an automatic approach to extend the navigation stru...

A Survey Paper of Structure Mining Technique using Clustering and Ranking Algorithm

A survey of various link analysis and clustering algorithms, such as PageRank, Hyperlink-Induced Topic Search, Weighted PageRank based on Visit of Links, K-Means, and Fuzzy K-Means. Among the ranking algorithms illustrated, Weighted PageRank is more efficient than Hyperlink-Induced Topic Search, whereas among the clustering algorithms described, Fuzzy Soft Rough K-Means is a mixture of Rough K-Means and fuzzy soft...

A Combined Model of Supervised and Unsupervised Learning for Reliable Web Page Prediction

Web usage mining is an important aspect of web mining that not only helps service providers learn about user interest in their contents or web site, but also helps users identify the best service provider. Web usage mining analyzes web links with respect to user interest and frequency of use. In this work we have defined a work o...

Combining Link and Contents in Clustering Web Search Results to Improve Information Interpretation

As information proliferates on the Web, digesting this huge, heterogeneous body of information, e.g. locating related resources and interpreting them accordingly, is far beyond human ability. While a Web search engine can retrieve information on the Web for a specific topic, users have to step through a long ordered list in order to locate the needed information, which is often t...

Link-Based Clustering for Finding Subrelevant Web Pages

We propose a new Web page clustering method. Typical search engines only provide relevant pages, i.e., the pages matching users’ needs. However, we design our clustering method to provide non-relevant pages as search results when they refer to relevant pages and help users anticipate the contents of those relevant pages. We call such pages subrelevant. As it is difficult to improve Web search performa...

Publication date: 2004